ZDOCj Architecture

For years, there was no freeware solution for editing DOC format files on Palm.  Users were limited to the 4096-character Memo Pad files, unless they wanted to pay for commercial software to edit larger files.  That all changed with the introduction of ZDOCm.  My most heartfelt gratitude goes out to "Zurk", the original author of ZDOC reader, and "Mizotec", who made ZDOCm, which included editing capabilities.  I seek to extend their work.

There is an excellent shareware program out there called QED, for reading and editing Palm DOCs.  It's relatively low cost, and the full functionality is available indefinitely as a trial.  I have no intention of making this program as good as that one.  My target user is one with very simple editing needs (except the need to work with large documents), who cannot in good conscience use QED without paying.

After I had already started on this program, I discovered that Ben Roe had released SiEd, a freeware editor for DOCs, with a rich feature set.  I don't know too much about it, because I only have a Palm OS 3.1 device, and SiEd requires 3.5.  The feature list is impressive, though.

The goal of ZDOCj is to hide the record structure from the user.  This means that the precise control over records possible with ZDOCm will be lost.  But hopefully, the program will be simpler to use for the casual user.  Ideally, the editing experience will be very similar to using Memo Pad.

The basic idea: A Field buffer is allocated in volatile RAM.  Records of size not to exceed 4k each exist in the database (I will loosely call this a "file") in Flash memory (I will loosely say "on disk").  Think of the Field buffer as a FIFO or queue for disk data.  Some data is read into the Field - it may be part of a record, or may span several contiguous records.  The program keeps track of the start and end of that data on disk.  For example, the Field buffer may start at record 6, offset 193, and end at record 9, offset 38.  These numbers are invisible to the user.  The data in the Field buffer always represents consecutive records on the disk, so it is understood that records 7 and 8 are entirely contained in the Field.

When the user scrolls the screen down, more data is fetched from disk, and the pointers for start and end are updated.  In the example above, if 1000 more bytes were desired, and fetched from record 9, the new end would be at record 9, offset 1038.
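The pointer bookkeeping described above can be sketched as follows.  This is only an illustration, not the ZDOCj source: the names FilePos and pos_advance are hypothetical, and the file is modeled as a plain array of record lengths rather than a Palm database.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical position type: a record index plus a byte offset into it. */
typedef struct {
    size_t rec;
    size_t off;
} FilePos;

/* Advance pos by n bytes, stepping across record boundaries as needed.
   recLen[i] is the text length of record i; numRecs is the record count.
   Returns the number of bytes actually advanced, which is less than n
   only when the end of the file is reached. */
size_t pos_advance(FilePos *pos, size_t n, const size_t *recLen, size_t numRecs)
{
    size_t moved = 0;

    assert(pos->rec < numRecs && pos->off <= recLen[pos->rec]);
    while (moved < n) {
        size_t left = recLen[pos->rec] - pos->off;  /* bytes left in this record */
        size_t step = (n - moved < left) ? n - moved : left;

        pos->off += step;
        moved += step;
        if (moved == n)
            break;
        if (pos->rec + 1 >= numRecs)
            break;                                  /* end of file */
        pos->rec++;                                 /* continue in the next record */
        pos->off = 0;
    }
    return moved;
}
```

With ten records of 4096 bytes each, advancing an end pointer at record 9, offset 38 by 1000 bytes leaves it at record 9, offset 1038, matching the example in the text.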

If there was enough room in the Field buffer for the extra 1000 bytes in the example above, well and good.  However, if there is not enough space, then the program must first make room, by posting (writing) data at the top of the Field buffer out to the corresponding place on disk.  The 1000 bytes at the beginning of the buffer are compared to the equivalent data on disk.  If they are not the same, the disk will be updated.  If the data is the same, the program doesn't bother to update the disk.  The 1000 bytes are then deleted from the Field, and the pointer for start of field is updated, perhaps to record 6, offset 1193.
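A minimal sketch of this compare-before-write step follows.  It makes simplifying assumptions that the real program cannot: the record is an ordinary in-memory array, the bytes being posted lie within one record, and the edit did not change the length of the text.  FieldWindow and post_head are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical view of the Field buffer and where its first byte lives on disk. */
typedef struct {
    char   *text;     /* field buffer contents */
    size_t  len;      /* bytes currently in the buffer */
    size_t  headRec;  /* record holding the first buffered byte */
    size_t  headOff;  /* offset of that byte within the record */
} FieldWindow;

/* Post the first n bytes of the window to rec (the record numbered
   win->headRec), then drop them from the buffer.
   Returns 1 if the record had to be written, 0 if it already matched. */
int post_head(FieldWindow *win, char *rec, size_t recLen, size_t n)
{
    int wrote = 0;

    assert(n <= win->len && win->headOff + n <= recLen);
    if (memcmp(rec + win->headOff, win->text, n) != 0) {
        /* memcpy stands in for the slow database write */
        memcpy(rec + win->headOff, win->text, n);
        wrote = 1;
    }
    memmove(win->text, win->text + n, win->len - n);  /* delete from the Field */
    win->len -= n;
    win->headOff += n;                                /* e.g. 193 becomes 1193 */
    return wrote;
}
```

Posting 1000 unmodified bytes from a window whose head is at record 6, offset 193 returns 0 (no write) and moves the head to record 6, offset 1193.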

Similar actions occur upon scrolling up, jumping to another part of the file, saving, etc.

Keeping the Field buffer filled and posting information to the disk when necessary are the primary challenges of the program.  Because posting should be done in an efficient manner (only write when we have to), it is considerably more complicated than fetching data from disk.  Some of the general philosophy I plan to use as guidelines:

1) If a whole record matches its corresponding text in the field, do not touch the record, even if it is small and would have room for more.  In such a case, most likely the next record will match the field, too, but writing extra data to the first record will mess up this synchronization.

2) If a record matches the field text up to the amount we are attempting to post to disk, then do not touch the record, even if it contains bytes that don't match beyond the posting region of interest.  These bytes could change later.

3) If a record matches the field text up to the amount we are trying to post to disk, see if the record also matches the field text up to the end of the record.  If it does, then there's no need to update the disk, but update the Field Head or Tail pointers as if the whole record has been posted.  On the assumption that 4k is by far the most common text size in a record, and 4k is the most common request for posting, this gives the best chance of needing to update only one record the next time, instead of two.

4) If a record must be written anyway because the field text is different from the record text, then if enough text is available in the field, fill out the whole record to maximum size.  By guiding towards writing full records, I hope to perform a primitive coalescence of tiny records.
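The four guidelines above can be read as a small decision function.  The sketch below is one possible interpretation, not the actual ZDOCj routine: it assumes the head of the field text lines up with the start of the record, and the names PostAction, PostPlan, plan_post, and MAX_REC are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REC 4096   /* maximum text bytes per DOC record */

typedef enum {
    POST_SKIP,         /* guideline 2: requested span matches; leave the record alone */
    POST_SKIP_WHOLE,   /* guidelines 1 and 3: whole record matches; count it all as posted */
    POST_WRITE         /* guideline 4: mismatch; rewrite, filling the record if possible */
} PostAction;

typedef struct {
    PostAction action;
    size_t     consumed;   /* field bytes to treat as posted */
} PostPlan;

/* Decide what to do when asked to post `want` bytes from the head of the
   field text against one record on disk. */
PostPlan plan_post(const char *rec, size_t recLen,
                   const char *field, size_t fieldLen, size_t want)
{
    PostPlan plan;
    size_t span;

    assert(want <= fieldLen);
    span = (want < recLen) ? want : recLen;

    if (memcmp(rec, field, span) != 0) {
        plan.action = POST_WRITE;
        plan.consumed = (fieldLen < MAX_REC) ? fieldLen : MAX_REC;
    } else if (fieldLen >= recLen && memcmp(rec, field, recLen) == 0) {
        plan.action = POST_SKIP_WHOLE;
        plan.consumed = recLen;
    } else {
        plan.action = POST_SKIP;
        plan.consumed = span;
    }
    return plan;
}
```

Note that a small record which matches entirely yields POST_SKIP_WHOLE with consumed equal to its own short length, so it is left untouched and the next record stays in sync, as guideline 1 asks.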

Design Goals:
I must first understand how the existing ZDOCm works.  If I can't even understand that, how can I expect to extend it without making similar mistakes?  I will document routines as I go.  Commenting is sparse; the author did not have English as a first language.
Reading a record from Flash is fast (faster than a hard disk!), but writing is slow, so avoid or delay writing whenever possible.  If no modifications have been made to the field, avoid writing.  Even if modifications have been made, do a compare to disk before writing, to avoid overwriting identical data.  Have no fear of the overhead of reading lots of tiny records.  It probably won't happen, anyway.  Note: the delay in writing comes slightly from the performance of Flash memory itself, but largely from executing the compression algorithm.  Reading a compressed 4k took about 30 ms, writing took 50 ms.  If full compression is used, it presently takes on the order of 27000 ms to write.  The other nasty thing about re-writing a compressed record is that the quick compress will generally generate a larger record, ballooning the file on disk.
Continuous display, regardless of how large or small the underlying records are.
Release under GNU license.  Obviously, since it's an extension of GNU source.
Search entire file as a monolithic entity (from the user standpoint); successfully find a string even if it crosses record boundaries.
Take advantage of larger memory when available, by allocating a larger buffer for the Field for editing.
Add whole lines of data when possible.  Or just make it so that user never sees a partial line.
Take advantage of the native editing capabilities of the Field type control on the Palm.
Emphasize programming structure over performance.  If the job can be done with fewer lines of code, I will generally choose that way, rather than add code to increase performance.  I will try to write routines with consistent naming and structure.  I hope to thus facilitate both debugging and testing - a bug found in one routine can be fixed in its parallel routines.  For example, the routine to fetch n bytes and prepend to the display should have the same template as the routine to fetch n bytes and append to the display.
Concessions to performance: 1) Use a "dirty" Boolean to remember whether the user modified any of the buffer.  Then, don't even need to compare when posting data to disk.  2) When scrolling large distances, jump to records.  For example, if the request is to scroll to the 75% point of a file of 100 records, then jump to record 75.  This won't be totally precise if all the records aren't the same size, but I'm willing to accept the approximation.  Most of the records written by the program will be decent-sized, and certainly, content coming from a PC generally uses full 4096-byte records.
Possibility that program will run under low RAM conditions.  After working on larger computers, I am often tempted to save a value in RAM which could be recomputed at a later time.  Unless there is a huge gain in efficiency, I won't bother doing this.  For example, there's a temptation to keep a table of all the record sizes around, but that would take an array as large as 64k integers.
Document size limited only by available Flash on the PDA to hold it, and the DOC format, itself.  DOC allows for about 64k records of 4k each, or 256 MB.  This is very large for text-only documents.  The Lilac Fairy Book is about 500 kB; the King James Bible is only about 4 MB, Encyclopedia Britannica is only about 1 GB.
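One of the goals above is a search that finds a string even when it crosses record boundaries.  A common way to do that, sketched here under the assumption that records can be visited one at a time in order, is to carry the last (pattern length - 1) bytes of the text seen so far into the next comparison window.  This is not necessarily how ZDOCj does it; find_across_records and MAX_PAT are hypothetical, and the 4k stack buffer would probably need to be a heap allocation on a real Palm.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REC 4096   /* maximum text bytes per record */
#define MAX_PAT 64     /* hypothetical cap on search-string length */

/* Search records rec[0..numRecs-1] for pat.  Returns the byte offset of the
   first match, counted from the start of the whole text, or -1 if absent. */
long find_across_records(const char *const *rec, const size_t *recLen,
                         size_t numRecs, const char *pat, size_t patLen)
{
    char   win[MAX_PAT + MAX_REC];
    size_t carry = 0;   /* bytes kept from the end of earlier records */
    size_t base = 0;    /* offset in the whole text of win[0] */
    size_t r, i;

    assert(patLen >= 1 && patLen <= MAX_PAT);
    for (r = 0; r < numRecs; r++) {
        size_t winLen, keep;

        assert(recLen[r] <= MAX_REC);
        memcpy(win + carry, rec[r], recLen[r]);
        winLen = carry + recLen[r];

        for (i = 0; i + patLen <= winLen; i++)
            if (memcmp(win + i, pat, patLen) == 0)
                return (long)(base + i);

        /* Keep only the tail that could still begin a match. */
        keep = (winLen < patLen - 1) ? winLen : patLen - 1;
        memmove(win, win + winLen - keep, keep);
        base += winLen - keep;
        carry = keep;
    }
    return -1;
}
```

For two records holding "hello wo" and "rld", searching for "world" returns offset 6, even though neither record contains the string by itself.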

NOT goals:
No plans to work on a Palm OS 1.0.  I like my StrPrintF.  But hope to work on 2.0.  MUST work on 3.1HS, because that's my PDA.
No heroics to keep the underlying record structure clean.  I will try not to fragment the file too much, but will make no effort to make sure that every possible record is packed with as many bytes as possible.
Making the program performance so good that the user won't notice when records are being posted to the database.  If it happens, well and good.
Speeding up compression or decompression performance.  I will advise the user that if they need to edit, they should not be using compression.  However, I will offer as a default, a "Quick Compress".  This will be compatible with DOC version 2 compression, but will not do the string replacement algorithm, which takes up most of the time in compression.  This won't offer much compression, and may even expand the file, in some cases, but will allow the user to read and write compressed files with reasonable performance.  In the event that quick compression would result in a record larger than 4096 bytes, the program will put up a warning, and revert to standard compression.
This is obviously not a programmer's editor.  No helpful features for dealing with code.
This is not intended for heavy text editing.  To the extent that someone would want to edit on a Palm in the first place, there are commercial programs for sale.
No intention of tuning search performance.  If someone plans to search the Bible, they had better buy a commercial Bible program.  I might do a partial Boyer-Moore just to see if I can do it.  I suspect most of the search time will be in simply reading and posting records to disk.  Can profile the possible advantage by stubbing out the string compare routine, and seeing how long it takes to traverse the file.
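To illustrate what a "Quick Compress" without string replacement might look like, here is a sketch based on the commonly described DOC byte codes: 0x00 and 0x09-0x7F are literals, a count byte 0x01-0x08 quotes that many following bytes, and 0xC0-0xFF stands for a space plus a character.  Whether ZDOCj's Quick Compress uses the space-pair code is an assumption on my part, and quick_compress, quick_expand, and QC_FAIL are hypothetical names.  The codes should be checked against the DOC specification before relying on them.

```c
#include <stddef.h>
#include <string.h>

#define QC_FAIL ((size_t)-1)   /* returned when the output buffer is too small */

/* Nonzero if b may be stored as-is in the compressed stream. */
static int is_plain(unsigned char b)
{
    return b == 0x00 || (b >= 0x09 && b <= 0x7F);
}

/* Encode without any back-references (no string replacement). */
size_t quick_compress(const unsigned char *in, size_t n,
                      unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;

    while (i < n) {
        if (in[i] == ' ' && i + 1 < n && in[i + 1] >= 0x40 && in[i + 1] <= 0x7F) {
            if (o + 1 > cap) return QC_FAIL;
            out[o++] = (unsigned char)(in[i + 1] | 0x80);   /* space + char in one byte */
            i += 2;
        } else if (is_plain(in[i])) {
            if (o + 1 > cap) return QC_FAIL;
            out[o++] = in[i++];
        } else {
            size_t run = 0;   /* up to 8 bytes that must be quoted */
            while (i + run < n && run < 8 && !is_plain(in[i + run]))
                run++;
            if (o + 1 + run > cap) return QC_FAIL;
            out[o++] = (unsigned char)run;
            memcpy(out + o, in + i, run);
            o += run;
            i += run;
        }
    }
    return o;
}

/* Decode the subset of codes that quick_compress produces. */
size_t quick_expand(const unsigned char *in, size_t n,
                    unsigned char *out, size_t cap)
{
    size_t i = 0, o = 0;

    while (i < n) {
        unsigned char c = in[i++];
        if (c >= 0x01 && c <= 0x08) {
            if (i + c > n || o + c > cap) return QC_FAIL;
            memcpy(out + o, in + i, c);
            o += c;
            i += c;
        } else if (c >= 0xC0) {
            if (o + 2 > cap) return QC_FAIL;
            out[o++] = ' ';
            out[o++] = (unsigned char)(c ^ 0x80);
        } else if (c >= 0x80) {
            return QC_FAIL;   /* back-reference: never produced by quick_compress */
        } else {
            if (o + 1 > cap) return QC_FAIL;
            out[o++] = c;
        }
    }
    return o;
}
```

Under these assumptions, plain ASCII text shrinks only where a space precedes a letter, while every run of high-bit bytes costs an extra count byte.  That is consistent with the remark above that quick compression may expand a file.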

Might be good, might be hard:
Point and Mark a la Emacs
Use custom font for tiny icons - no, no good for palmos 2.0.  Only option would be to draw bitmaps under blank buttons (no bitmap buttons supported).  No, too much trouble for too little gain.  Use standard character set. Besides, who wants a ton of icons on a Palm?

- Roderick Young; 21-JUL-2004.

Notes during implementation:

DOC programs hate to have a completely empty DOC, that is, a document with nothing but the header record.  I will therefore put some text in, even if the user tries to save a blank file.  It was easy enough for ZDOCj to handle an empty DOC, but this is out of consideration for other programs.

Found out that Palm does not let you write a database record with length 0.  Too bad, the code becomes inelegant to deal with this.

Found out that Palm does not let you replace the selection with a null string.  Too bad, this made the code inelegant, too.

Found out that if you get the text handle of a field, then set the text handle of the field to NULL, then restore the original text handle of the field, the insert point and scroll position may be messed up, even if you never modified any of the text owned by the handle.  Must always restore the scroll position, insert position and selection after such an operation.

I use a structure PosType to represent the position in the file, as well as what is displayed on the screen.  Many of the variables of type PosType use only a subset of the structure elements.  When fully populated, a PosType structure contains enough information to restore an exact position in the file to the display, for example, if the application is exited and re-entered.

One particular variable, goInfo, is of interest.  This contains the present insertion point, selection, and scroll position of the displayed field.  Any function called by the event handler can assume that these values are valid upon entry.  Any function which changes the screen must update these values before returning to the event handler.  The user can modify these, of course, so when a nilEvent is received (that is, when we have nothing better to do), the values of goInfo are updated, if needed.  Also, functions which require accurate selection will query the insert position and selection themselves.

One of the things that really stinks about Palm OS, at least the early versions with which I must be compatible, is the inability to send a message directly to another form for immediate processing.  One would think that something like FrmUpdateForm() would send a message to the targeted form only, but instead, the message is just placed into the general queue, and picked up by the active form.  What I've had to do to communicate between forms is queue the message, then immediately exit the present form.  In Windows, even the old 2.0 single-threaded Windows, there was a SendMessage() function.

Messages.  I have to keep them small to fit on the Palm screen.  Informational messages will just go to a status area, and if they get overwritten, too bad.  Warning messages will blink, and stay up for at least a second.  Any other errors had better be so bad, that it's necessary to stop the program.  For these, I'll put up an Alert, hopefully with detailed text, and require soft reset.  "Out of Memory" is about the only really bad one I can think of.  In general, use StatusMsg to let user know information, warnings, and errors.  Blinking the message takes time, so don't blink for informational or common warning messages.  Sometimes, I might blink a rare, unexpected warning, even if it is inconsequential.  Serious errors always blink.

Fonts.  I had a request, after version 1.1, from a user who wanted support for a tiny font.  So I included a custom font, which will only be good for hi-res 320x320.  Found out that there is no good way to enumerate all the fonts on the PDA.  I used FntSetFont() to set font n, then FntGetFont() to see if it "took".  If the font does not exist, my hope is that FntGetFont() will now return the default font, instead.  This isn't perfect.  All Palms except for OS 6.x seem to have fonts 128, 129, 130, 131 in place, even though no custom fonts are loaded.  Also, passing a trial value of n to FntSetFont() creates a debugger break if n is not a valid FontID.  Makes debugging tedious.

In my EventLoop, I call EvtGetEvent() like most programs.  I needed to get the nilEvent once in a while, but found that I got it even if I set the timeout to EvtWaitForever.  At least, this was true up to OS 5.x.  On OS 6.x, it appears that I may need to set an actual timeout.  Too bad; I think this inhibits the processor going to sleep, and increases power dissipation.

Resizing and Dynamic Input Area support.  Thank goodness Dr. Alexander Pruss put together a package to make it easy to adapt any program for Hi-res.  The package can be found at http://palmresize.sf.net .  The files incorporated herein are: config.h, DIA.h, resize.h, resizeconst.h, DIA.c, resize.c.  Please address comments and suggestions for improvement to Dr. Pruss at the website.

Release history:

Version 0.7 posted to Freewarepalm, PalmGear, and Tucows around Oct 10, 2004.  The response has been underwhelming; no one has responded.  I don't really have more features I plan to add, but was hoping for some bug reports.

Version 0.8 fixed the compression algorithm so that 0x09 (tab) is passed, rather than considered a quote-count.  This doesn't match the pyrite.org spec, but is de facto what other programs seem to do.  Fixed a bug such that when you cancelled out of a running search, a flag was set permanently, and you could never search again.  Also fixed some things which only affected the Debug ROM, like drawing to a form before it is initialized.

Version 0.9 never released.

Version 1.0 went out about Thanksgiving, 2004.  It fixed two hangs, and incorporated user requests to sort the document list, and stop the annoying beep when showing the time.

Version 1.1 went out at the end of January, 2005.  All it did was fix a bug of forgetting to refresh the dropdown file list when a file was renamed.

Version 1.2 was the first one compiled under Palm OS Development Suite. It has not been released yet.  Some of the bugs fixed therein were due to a change in Cobalt, or deficiencies in the development environment itself. These are not mentioned in the release notes, since they didn't apply to previous releases. This version added resizing support, and a tiny font for use under High Resolution. It also added an option to have a field with no underlines. The font selector changed, for access to more fonts. Some button and menu text was changed to more standard characters that will display even under a Japanese OS.

Version 1.6 coalesces records when writing.  The previous goal of not re-writing a record that is already good is abandoned.
